VIDEO MOTION ESTIMATION

This invention relates to video motion estimation
of the type which can be used in a video coder for
reducing the bit rate of a digital video signal, such as
may be required for storage on digital storage media or
for broadcast.

Various proposals for reducing the bit rate of a
digital video signal have been made, such as the MPEG-1 and
MPEG-2 algorithms from the Moving Picture Experts Group
(MPEG). MPEG-1 is now an international standard and is
specified in ISO/IEC document IS 11172 parts 1, 2 and 3. A
description of the MPEG-1 video compression algorithm can
be found in Communications of the ACM, April 1991, Vol. 34,
No. 4. MPEG-2 is a substantial extension of MPEG-1 and is
due for publication as an international standard in 1995.

Both MPEG algorithms define the pictures 
comprising a video sequence as being one of three types. 
These are as follows: 

A. Intra pictures (I) are coded without reference to
other pictures. They serve as access points to the
coded video sequence where decoding can begin.

B. Predicted pictures (P) are coded with reference to
a motion compensated prediction derived from a
previous I or P picture. Coding of P pictures is
more efficient than for I pictures.

C. Bi-directionally predicted pictures (B) are coded
with reference to a forward motion compensated
prediction from a previous I or P picture and a
backward motion compensated prediction from a
future I or P picture and provide the greatest
degree of compression.

The order of the three picture types within a 
video sequence is not constrained and will generally 
depend on the requirements of the application. 

A typical sequence of I, P and B pictures is
illustrated in Figure 1. Here, frame I1 is an I picture,
frames P4 and P7 are P pictures and frames B2, B3, B5, B6,
B8 and B9 are B pictures. Frame P4 is coded with
reference to a forward prediction derived from frame I1.
Frame P7 is coded with reference to a forward prediction
derived from frame P4. Frames B2 and B3 are coded with
reference to a forward prediction derived from frame I1
and a backward prediction derived from frame P4. Frames
B5 and B6 are coded with reference to a forward prediction
derived from frame P4 and a backward prediction derived
from frame P7. Frames B8 and B9 are coded with reference
to a forward prediction derived from frame P7 and a
backward prediction derived from frame I10.

When a video coder is implemented in hardware, a
large part of its circuitry is devoted to the measurement
of motion vectors in order to generate the required motion
compensated predictions. In the case of B pictures, two
simultaneous motion measurements are required in order to
generate both the forward and backward predictions. In
the case of the MPEG-2 algorithm, twice as many motion
measurements are required, as both 'frame' and 'field'
versions of each motion vector are needed.

We have appreciated that the number of motion
measurements, and hence the amount of motion measurement
circuitry, may be considerably reduced if some additional
processing of the forward motion vectors is performed.
Such processing may be implemented in either hardware or
software.

The invention is defined in the appended claims to
which reference should now be made.

Specific embodiments of the invention will now be
described in detail by way of example with reference to
the accompanying drawings in which:

Figure 1 shows the sequence of I, B, and P frames
described above;

Figure 2 shows a conventional motion estimator
arrangement required to produce vectors defining the
sequence of frames shown in Figure 1; and

Figure 3 shows a motion estimator embodying the
invention.

An MPEG picture sequence of the type shown in Figure 1,
which is in display order and where I, P and B refer to
the MPEG picture type, is as follows.

B6, B7, P8, B9, B10, P11, B12, B13, P14, B15, B16,
I17, B18, B19, P20, etc.

The sequence may be split into "triples" of {BBP}
or (less often) {BBI}, i.e. two interpolated pictures and
one reference picture.
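
The grouping into triples can be illustrated with a short
sketch. The function name split_into_triples and the use of
string labels such as "B12" are assumptions made purely for
illustration; the sketch also assumes that the sequence
starts on a triple boundary.

```python
def split_into_triples(display_order):
    """Group display-order picture labels into {BBP} or {BBI} triples.

    Assumes the list starts on a triple boundary and that labels begin
    with the picture type letter (I, P or B). Hypothetical helper.
    """
    triples = []
    for start in range(0, len(display_order) - 2, 3):
        triple = tuple(display_order[start:start + 3])
        pattern = "".join(label[0] for label in triple)
        if pattern not in ("BBP", "BBI"):
            raise ValueError(f"unexpected picture pattern {pattern!r}")
        triples.append(triple)
    return triples


sequence = ["B12", "B13", "P14", "B15", "B16", "I17"]
triples = split_into_triples(sequence)
```

Each triple ends in the reference picture from which the
backward predictions of its two B pictures are derived.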

For each {BBP} triple, a coder's motion estimator
must generate five motion vectors. For example, in the
case of the triple {B12, B13, P14} the five required
motion vectors are:

P11 to P14 (Forward prediction of P14 from P11)
P11 to B12 (Forward prediction of B12 from P11)
P11 to B13 (Forward prediction of B13 from P11)
P14 to B12 (Backward prediction of B12 from P14)
P14 to B13 (Backward prediction of B13 from P14)

For each {BBI} triple a coder's motion estimator must
generate four motion vectors. For example, in the case of
the triple {B15, B16, I17} the four required motion
vectors are:

P14 to B15 (Forward prediction of B15 from P14)
P14 to B16 (Forward prediction of B16 from P14)
I17 to B15 (Backward prediction of B15 from I17)
I17 to B16 (Backward prediction of B16 from I17)
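
The vectors required for either kind of triple can be
enumerated as in the following sketch. The function name
required_vectors and the (source, target) tuple
representation are illustrative assumptions, not part of
the described coder.

```python
def required_vectors(previous_reference, triple):
    """List the (source, target) motion vectors needed for one triple.

    A {BBP} triple needs five vectors; a {BBI} triple needs four because
    the I picture itself is coded without any prediction.
    """
    first_b, second_b, reference = triple
    vectors = [
        (previous_reference, first_b),   # forward prediction of first B
        (previous_reference, second_b),  # forward prediction of second B
        (reference, first_b),            # backward prediction of first B
        (reference, second_b),           # backward prediction of second B
    ]
    if reference.startswith("P"):
        # Only a P picture is predicted from the previous reference.
        vectors.insert(0, (previous_reference, reference))
    return vectors


bbp = required_vectors("P11", ("B12", "B13", "P14"))
bbi = required_vectors("P14", ("B15", "B16", "I17"))
```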

To date, the architecture which is generally
proposed to generate these motion vectors is based around
two motion estimators as shown in Figure 2. This
comprises a forward motion estimator 2 receiving a
previous reference picture and the current picture and
producing forward motion vectors, and a backward motion
vector estimator 4 receiving a future reference picture
and the current picture and producing backward motion
vectors. These are the vectors referred to above.

Table 1 shown below describes the passage of the
sequence B12, B13, P14, B15, B16, I17 through the motion
estimators. It will be seen that the sequence of the
pictures is re-ordered to P14, B12, B13, I17, B15, B16
prior to input to the motion estimators in accordance with
the MPEG algorithm. Each column of the table represents
an instant in time. For example, the first column means,
"when P14 is on the current picture input, P11 is applied
to the previous reference picture input and the forward
motion vector output generates the motion vector P11 to
P14". The table demonstrates how the two motion
estimators of Figure 2 can together generate the required
five motion vectors for each {BBP} triple and the required
four motion vectors for each {BBI} triple.

Table 1

Current Picture Input   P14   B12   B13   I17   B15   B16

Previous Reference      P11   P11   P11   P14   P14   P14
Picture Input

Future Reference        -     P14   P14   -     I17   I17
Picture Input

Forward Motion          P11   P11   P11   -     P14   P14
Vector Output           to    to    to          to    to
                        P14   B12   B13         B15   B16

Backward Motion         -     P14   P14   -     I17   I17
Vector Output                 to    to          to    to
                              B12   B13         B15   B16
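
The re-ordering and the contents of Table 1 can be
reproduced with the following sketch. The function names
coding_order and two_estimator_schedule, and the tuple
layout of each row, are assumptions made for illustration;
the sketch models only which pictures appear on which
inputs, not the motion measurement itself.

```python
def coding_order(display_order):
    """Place each reference picture ahead of the B pictures that precede it."""
    reordered, pending_b = [], []
    for label in display_order:
        if label.startswith("B"):
            pending_b.append(label)
        else:
            reordered.append(label)
            reordered.extend(pending_b)
            pending_b = []
    return reordered + pending_b


def two_estimator_schedule(first_reference, display_order):
    """Return (current, previous_ref, future_ref, forward, backward) per instant."""
    older, newer = None, first_reference
    rows = []
    for current in coding_order(display_order):
        if current.startswith("B"):
            # B pictures use both estimators of the conventional arrangement.
            rows.append((current, older, newer, (older, current), (newer, current)))
        else:
            # A new reference arrives: the latest reference becomes the previous one.
            older, newer = newer, current
            forward = (older, current) if current.startswith("P") else None
            rows.append((current, older, None, forward, None))
    return rows


rows = two_estimator_schedule("P11", ["B12", "B13", "P14", "B15", "B16", "I17"])
```

Each element of rows corresponds to one column of Table 1,
with None standing for an empty cell.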



The amount of hardware required for the video coder
can be reduced if only a single one of the motion
estimators of Figure 2 is used. This becomes feasible if
further vector processing is used to generate the
additional motion vectors. An arrangement for a single
motion vector estimator is shown in Figure 3. Table 2
illustrates the passage of the re-ordered sequence P14,
B12, B13, I17, B15, B16 through the motion estimator of
Figure 3.

Table 2

Current Picture Input     P14   B12   B13   I17   B15   B16

Reference Picture Input   P11   P11   P11   P14   P14   P14

Motion Vector Generated   P11   P11   P11   P14   P14   P14
                          to    to    to    to    to    to
                          P14   B12   B13   I17   B15   B16

The required motion vectors for the {BBP} triple are
(P11 to P14), (P11 to B12), (P11 to B13), (P14 to B12) and
(P14 to B13). The motion estimator has generated the
first three of these. The missing two motion vectors may
be calculated using additional vector processing as
follows:

(P14 to B12) = (P11 to B12) - (P11 to P14)
(P14 to B13) = (P11 to B13) - (P11 to P14)

Similarly, the required motion vectors for the {BBI}
triple are (P14 to B15), (P14 to B16), (I17 to B15) and
(I17 to B16). The motion estimator has generated the
first two of these as well as the vector (P14 to I17).
The missing two motion vectors may be calculated as
follows:

(I17 to B15) = (P14 to B15) - (P14 to I17)
(I17 to B16) = (P14 to B16) - (P14 to I17)
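
The subtraction above can be sketched for a single
displacement as follows. The function name derive_backward,
the (dx, dy) tuple representation and the example values
(an object moving 2 pixels right and 1 pixel down per
picture) are hypothetical; a real coder would apply the
same arithmetic to every measured vector.

```python
def derive_backward(forward_to_b, forward_to_reference):
    """Derive (reference to B) as (previous to B) - (previous to reference).

    Vectors are (dx, dy) displacements in pixels. Hypothetical helper.
    """
    return (
        forward_to_b[0] - forward_to_reference[0],
        forward_to_b[1] - forward_to_reference[1],
    )


# Assumed example: constant motion of (2, 1) pixels per picture.
p11_to_b12 = (2, 1)
p11_to_b13 = (4, 2)
p11_to_p14 = (6, 3)

p14_to_b12 = derive_backward(p11_to_b12, p11_to_p14)  # (-4, -2)
p14_to_b13 = derive_backward(p11_to_b13, p11_to_p14)  # (-2, -1)
```

The derived vectors point back in time from P14, which is
why both components are negative in this example.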

As long as the forward motion estimator generates
the correct forward motion vectors, then the derived
backward motion vectors are correct. This holds for
objects moving with linear and non-linear motion, which
represents the majority of picture material. The method
can break down if the image contains repetitive patterning
(e.g. a chess board). In such a case, the forward motion
estimator may identify a number of motion vectors that
will yield equally good predictions even though they need
not represent the actual motion of the object. If the
forward motion vector is incorrect, the derived backward
motion vector may yield a poor prediction. (It should be
noted that if a prediction resulting from a backward
motion vector is poor, the coder will simply choose some
other prediction mode for that particular macroblock and
the erroneous backward motion vector will go unnoticed.)

The additional vector processing required to 
generate all forward and backward vectors using a single 
vector estimator could be implemented in hardware using 
vector adders and/or subtractors and some suitable storage 
and switching arrangement to supply the vectors to the 
adders at the correct times. Alternatively the processing 
could be carried out in software. 

The current MPEG-2 test model details a further
improved bit-rate reduction standard comprising field and
frame motion vectors. The idea is as follows.

MPEG-2 defines a macroblock as covering an area of
picture 16 pixels by 16 frame lines, 8 lines originating
from the odd field and the other 8 lines originating from
the even field. The job of a coder's prediction generator
stage is to generate "predictions" of the macroblocks
comprising the current frame. If the image is moving then
these predictions will generally be realised by motion
compensating (i.e. assigning one or more motion vectors to
each macroblock of) a reference frame.

At present the motion estimators (of the type
referred to above) generate both 'field' and 'frame'
motion vectors in order to derive two predictions of each
macroblock in the current frame: one based on the field
motion vectors, the other based on the frame motion
vector. The coder chooses the prediction which most
closely matches the actual macroblock of the current
frame.

In the case of frame motion vectors, a macroblock is
treated as a single 16 x 16 block of a frame. A single
"frame" motion vector is assigned to describe how a
prediction of the macroblock may be generated by motion
compensating a 16 x 16 pixel area of a reference frame.

In the case of field motion vectors, two "field"
motion vectors are used to independently generate
predictions of the odd field and even field components of
a macroblock, i.e. one field motion vector is used to
predict the 8 odd lines of the macroblock from the odd
field of a reference picture and the other field motion
vector is used to predict the 8 even field lines from the
even field of a reference picture.
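
The division of a macroblock into its two field components
can be sketched as follows. The function names split_fields
and merge_fields are hypothetical, and the sketch assumes
that the first frame line of the macroblock belongs to the
odd field.

```python
def split_fields(macroblock):
    """Split 16 frame lines into 8 odd-field and 8 even-field lines.

    Assumption: the first frame line (index 0) belongs to the odd field.
    """
    odd_lines = macroblock[0::2]
    even_lines = macroblock[1::2]
    return odd_lines, even_lines


def merge_fields(odd_lines, even_lines):
    """Interleave the two field components back into frame lines."""
    frame_lines = []
    for odd, even in zip(odd_lines, even_lines):
        frame_lines.append(odd)
        frame_lines.append(even)
    return frame_lines


# A dummy 16 x 16 macroblock in which every pixel holds its line index.
macroblock = [[line] * 16 for line in range(16)]
odd_lines, even_lines = split_fields(macroblock)
```

A field based prediction would predict odd_lines and
even_lines separately, each with its own field motion
vector, and interleave the two results.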

To date, the proposals for implementing field and
frame motion vectors in hardware make use of two
independent motion estimator blocks, one performing frame
motion vector generation, the other performing field
motion vector generation. Both motion estimators
represent the same amount of hardware. This is because,
although the motion estimator concerned with field motion
vector generation produces twice as many motion vectors as
the other, it needs only to compare [8 x 16] blocks of image
data as opposed to the [16 x 16] blocks processed by the
frame motion estimator.

The motion vectors are used to generate two
predictions of a macroblock from the current picture, one
generated using the frame motion vector, the other
generated using the two field motion vectors. It is then
the job of the coder's "decision module" to decide which
is the better prediction.
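
One way such a decision could be made is sketched below.
The text does not specify the matching criterion, so the
sum of absolute differences (SAD) used here is an
assumption, as are the function names sad and
choose_prediction.

```python
def sad(block_a, block_b):
    """Sum of absolute pixel differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def choose_prediction(actual, frame_prediction, field_prediction):
    """Return the name of the prediction that matches the macroblock best.

    Ties are resolved in favour of the frame prediction (arbitrary choice).
    """
    frame_cost = sad(actual, frame_prediction)
    field_cost = sad(actual, field_prediction)
    return "frame" if frame_cost <= field_cost else "field"


# Tiny 2 x 2 blocks stand in for 16 x 16 macroblocks in this example.
actual = [[10, 12], [14, 16]]
frame_prediction = [[10, 12], [11, 16]]   # SAD = 3
field_prediction = [[10, 12], [14, 15]]   # SAD = 1
decision = choose_prediction(actual, frame_prediction, field_prediction)
```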

If the motion seen by the motion estimator in the
odd field is different from that seen in the even field
then the two field motion vectors will be different and
the field motion vector based prediction is highly likely
to be the best.

If the motion seen by the motion estimator is the
same in both fields then the two field motion vectors will
be identical and further, they will probably match the
frame motion vector. In this case there is unlikely to be
any significant difference between the two predictions.

We propose that the frame motion estimator can be
dispensed with. A frame motion vector may be derived
instead by averaging the two field motion vectors. If the
two field motion vectors match, then the frame motion
vector will have the same value, resulting in similar frame
and field based predictions. However, if the two field
motion vectors are different, the frame motion vector is
likely to be incorrect and the field based prediction is
likely to be best. The vector generation is explained in
more detail below.

Motion occurring in the odd and even fields of a
previous frame is independently measured and used to
generate predictions of the odd and even fields of a
future frame, i.e. the odd field of the future frame is
predicted from the odd field of the previous frame and
correspondingly with the even field. Thus one prediction
of a single future frame comprising odd and even fields
has been generated.

Next the vectors measured in the odd field are
averaged with those in the even field to give a set of
frame motion vectors. A further prediction of the future
frame may now be generated by motion compensating the
previous frame with the set of frame motion vectors.

Thus there are two predictions of a single specific
future frame. One is derived by independently measuring
the motion occurring in each field of a previous frame and
motion compensating each field accordingly. The other is
derived by averaging the motion vectors from the two
fields of a previous frame to yield a set of frame motion
vectors and using these to compensate the whole of the
previous frame.
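
The averaging step can be sketched as follows. The function
name frame_vector_from_fields is hypothetical, both field
vectors are assumed to be expressed in the same units, and
the result is left unrounded because the text does not
state how fractional averages are handled.

```python
def frame_vector_from_fields(odd_field_vector, even_field_vector):
    """Average two field motion vectors to obtain a frame motion vector.

    Vectors are (dx, dy) tuples; the average is returned as floats and
    any rounding policy is left to the caller (not specified in the text).
    """
    return (
        (odd_field_vector[0] + even_field_vector[0]) / 2,
        (odd_field_vector[1] + even_field_vector[1]) / 2,
    )


# Matching field vectors give an identical frame vector.
same_motion = frame_vector_from_fields((4, 2), (4, 2))

# Differing field vectors give a compromise that fits neither field well,
# in which case the field based prediction is likely to be chosen.
mixed_motion = frame_vector_from_fields((4, 0), (-2, 0))
```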

This combination of field vectors can be implemented
in dedicated hardware or in software in the motion
estimator.

CLAIMS

1. A method for estimating motion vectors defining
image displacements between reference images in a
sequence of digital video images and intermediate images
falling between reference images in the sequence, the
method comprising the steps of estimating motion vectors
for deriving a prediction of one reference image from
another in the sequence, estimating motion vectors for
deriving a prediction of each intermediate image between
two reference images in the sequence from one of the two
reference images or their predictions, and from the
estimated vectors deriving further vectors for deriving a
prediction of each intermediate image from the other one
of the said two reference images or their predictions
which it falls between.

2. A method according to claim 1 in which the thus
derived vectors conform with a predetermined bit-rate
reduced video signal standard.

3. A method according to claim 1 in which each image
consists of two or more fields and where motion occurring
in each field is measured independently in order to
generate a set of field motion vectors for each field and
the sets of field motion vectors are combined to produce a
single set of frame motion vectors, thereby enabling two
predictions of a future image to be made.

4. A method for video motion estimation between images
in a sequence in which each image consists of two or more
fields and where motion occurring in each field is
measured independently in order to generate a set of field
motion vectors for each field and the sets of field motion
vectors are combined to produce a set of frame motion
vectors, thereby enabling two predictions of a future image
to be made.

5. A method for estimating motion vectors defining
image displacements between reference frames in a sequence
of digital video images and intermediate frames falling
between frames in the sequence substantially as herein
described.

6. A method for video motion estimation substantially
as herein described.